An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model

Authors

  • Takashi Nose
  • Takao Kobayashi
Abstract

To intuitively control the intensity of emotional expressions and speaking styles in synthetic speech, we introduce subjective style intensities and multiple-regression global variance (MRGV) models into hidden Markov model (HMM)-based expressive speech synthesis. One problem with conventional parametric style modeling and style control techniques is that the intensity of the styles appearing in synthetic speech depends strongly on the training data. To alleviate this problem, the proposed technique explicitly takes into account the subjective style intensity perceived for each training utterance, using multiple-regression hidden semi-Markov models (MRHSMMs). As a result, synthetic speech becomes less sensitive to the variation in style expressivity present in the training data. Another problem is that synthetic speech generally suffers from over-smoothing of the model parameters during training, so the variance of the generated speech parameter trajectory becomes smaller than that of natural speech. To alleviate this problem in the style control setting, we extend the conventional variance compensation method, which is based on a global variance (GV) model for single-style speech, to multiple styles with variable style intensities by deriving the MRGV model. Objective and subjective experimental results show that these two techniques significantly enhance the intuitive style control of synthetic speech, which is essential for a speech synthesis system to communicate para-linguistic information correctly to listeners. © 2012 Elsevier B.V. All rights reserved.
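
The multiple-regression idea in the abstract can be illustrated with a toy example. The sketch below assumes a linear form in which a target vector (for instance a per-utterance global variance) is modeled as H applied to [1, style intensity]; the abstract does not give the exact formulation, so this form, the function names `fit_regression` and `regress_mean`, and the toy numbers are all illustrative assumptions. Ordinary least squares stands in for the model training used in the paper.

```python
import numpy as np


def fit_regression(targets, styles):
    """Estimate a regression matrix H by ordinary least squares.

    targets: (N, D) per-utterance target vectors (e.g. global variances).
    styles:  (N, L) per-utterance subjective style intensities.
    Returns H with shape (D, L + 1); column 0 is the bias term.
    Assumption: a linear model, target ~= H @ [1, style].
    """
    styles = np.asarray(styles, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Design matrix with a leading column of ones for the bias.
    X = np.column_stack([np.ones(len(styles)), styles])
    # Solve X @ H.T ~= targets in the least-squares sense.
    Ht, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return Ht.T


def regress_mean(H, style):
    """Predict the target vector for a requested style intensity."""
    xi = np.concatenate(([1.0], np.asarray(style, dtype=float)))
    return H @ xi


# Toy data (hypothetical): one style dimension, two target dimensions.
styles = [[0.0], [0.5], [1.0], [1.5]]
targets = [[1.0, 0.5], [2.0, 1.0], [3.0, 1.5], [4.0, 2.0]]

H = fit_regression(targets, styles)
# Ask for an intensity stronger than any seen in the toy training data.
print(regress_mean(H, [2.0]))
```

Because the target is a function of the style intensity, the intensity can be set at synthesis time instead of being fixed by whatever the training recordings happened to contain, which is the kind of control the abstract describes.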

Similar resources

A style control technique for speech synthesis using multiple regression HSMM

This paper presents a technique for controlling intuitively the degree or intensity of speaking styles and emotional expressions of synthetic speech. The conventional style control technique based on multiple regression HMM (MRHMM) has a problem that it is difficult to control phone duration of synthetic speech because HMM has no explicit parameter which models phone duration appropriately. To ...

A style control technique for singing voice synthesis based on multiple-regression HSMM

This paper proposes a technique for controlling singing style in the HMM-based singing voice synthesis. A style control technique based on multiple regression HSMM (MRHSMM), which was originally proposed for the HMM-based expressive speech synthesis, is applied to the conventional technique. The idea of pitch adaptive training is introduced into the MRHSMM to improve the modeling accuracy of fu...

A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM

This paper describes a technique for modeling and controlling emotional expressivity of speech in HMM-based speech synthesis. A problem of conventional emotional speech synthesis based on HMM is that the intensity of an emotional expression appearing in synthetic speech completely depends on the database used for model training. To take into account the emotional expressivity that listeners act...

A style control technique for HMM-based speech synthesis

This paper describes an approach to controlling style of synthetic speech in HMM-based speech synthesis. The style is defined as one of speaking styles and emotional expressions in speech. We model each speech synthesis unit by using a context-dependent HMM whose mean vector of the output distribution function is given by a function of a parameter vector called style control vector. We assume t...

Recent Development of HMM-Based Expressive Speech Synthesis and Its Applications

This paper describes the recent development of HMM-based expressive speech synthesis. Although the expressive speech includes a wide variety of expressions such as emotions, speaking styles, intention, attitude, emphasis, focus, and so on, we mainly refer to the speech synthesis techniques for emotions and speaking styles, which would be the most primary expressions in human speech communicatio...

Journal:
  • Speech Communication

Volume 55

Publication date: 2013